50 research outputs found

    OpenTED Browser: Insights into European Public Spendings

    Full text link
    We present the OpenTED browser, a Web application allowing to interactively browse public spending data related to public procurements in the European Union. The application relies on Open Data recently published by the European Commission and the Publications Office of the European Union, from which we imported a curated dataset of 4.2 million contract award notices spanning the period 2006-2015. The application is designed to easily filter notices and visualise relationships between public contracting authorities and private contractors. The simple design allows for example to quickly find information about who the biggest suppliers of local governments are, and the nature of the contracted goods and services. We believe the tool, which we make Open Source, is a valuable source of information for journalists, NGOs, analysts and citizens for getting information on public procurement data, from large scale trends to local municipal developments.Comment: ECML, PKDD, SoGood workshop 201

    Feature selection in high-dimensional dataset using MapReduce

    Full text link
    This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features

    Streaming Active Learning Strategies for Real-Life Credit Card Fraud Detection: Assessment and Visualization

    Full text link
    Credit card fraud detection is a very challenging problem because of the specific nature of transaction data and the labeling process. The transaction data is peculiar because they are obtained in a streaming fashion, they are strongly imbalanced and prone to non-stationarity. The labeling is the outcome of an active learning process, as every day human investigators contact only a small number of cardholders (associated to the riskiest transactions) and obtain the class (fraud or genuine) of the related transactions. An adequate selection of the set of cardholders is therefore crucial for an efficient fraud detection process. In this paper, we present a number of active learning strategies and we investigate their fraud detection accuracies. We compare different criteria (supervised, semi-supervised and unsupervised) to query unlabeled transactions. Finally, we highlight the existence of an exploitation/exploration trade-off for active learning in the context of fraud detection, which has so far been overlooked in the literature

    Transfer Learning Strategies for Credit Card Fraud Detection.

    Get PDF
    Credit card fraud jeopardizes the trust of customers in e-commerce transactions. This led in recent years to major advances in the design of automatic Fraud Detection Systems (FDS) able to detect fraudulent transactions with short reaction time and high precision. Nevertheless, the heterogeneous nature of the fraud behavior makes it difficult to tailor existing systems to different contexts (e.g. new payment systems, different countries and/or population segments). Given the high cost (research, prototype development, and implementation in production) of designing data-driven FDSs, it is crucial for transactional companies to define procedures able to adapt existing pipelines to new challenges. From an AI/machine learning perspective, this is known as the problem of transfer learning. This paper discusses the design and implementation of transfer learning approaches for e-commerce credit card fraud detection and their assessment in a real setting. The case study, based on a six-month dataset (more than 200 million e-commerce transactions) provided by the industrial partner, relates to the transfer of detection models developed for a European country to another country. In particular, we present and discuss 15 transfer learning techniques (ranging from naive baselines to state-of-the-art and new approaches), making a critical and quantitative comparison in terms of precision for different transfer scenarios. Our contributions are twofold: (i) we show that the accuracy of many transfer methods is strongly dependent on the number of labeled samples in the target domain and (ii) we propose an ensemble solution to this problem based on self-supervised and semi-supervised domain adaptation classifiers. The thorough experimental assessment shows that this solution is both highly accurate and hardly sensitive to the number of labeled samples

    Use Scenarios & Practical Examples of AI Use in Education

    Full text link
    This report presents a set of use scenarios based on existing resources that teachers can use as inspiration to create their own, with the aim of introducing artificial intelligence (AI) at different pre-university levels, and with different goals. The Artificial Intelligence Education field (AIEd) is very active, with new resources and tools arising continuously. Those included in this document have already been tested with students and selected by experts in the field, but they must be taken just as practical examples to guide and inspire teachers creativity.Comment: Developed within the AI in Education working group of the European Digital Education Hu

    Transfer learning for credit card fraud detection : A journey from research to production.

    Get PDF
    The dark face of digital commerce generalization is the increase of fraud attempts. To prevent any type of attacks, state-of-the-art fraud detection systems are now embedding Machine Learning (ML) modules. The conception of such modules is only communicated at the level of research and papers mostly focus on results for isolated benchmark datasets and metrics. But research is only a part of the journey, preceded by the right formulation of the business problem and collection of data, and followed by a practical integration. In this paper, we give a wider vision of the process, on a case study of transfer learning for fraud detection, from business to research, and back to business

    Transfer learning for credit card fraud detection : A journey from research to production.

    Get PDF
    The dark face of digital commerce generalization is the increase of fraud attempts. To prevent any type of attacks, state-of-the-art fraud detection systems are now embedding Machine Learning (ML) modules. The conception of such modules is only communicated at the level of research and papers mostly focus on results for isolated benchmark datasets and metrics. But research is only a part of the journey, preceded by the right formulation of the business problem and collection of data, and followed by a practical integration. In this paper, we give a wider vision of the process, on a case study of transfer learning for fraud detection, from business to research, and back to business

    Apprentissage dans les réseaux de capteurs pour une surveillance environnementale moins coûteuse en énergie

    No full text
    Wireless sensor networks form an emerging class of computing devices capable of observing the world with an unprecedented resolution, and promise to provide a revolutionary instrument for environmental monitoring. Such a network is composed of a collection of battery-operated wireless sensors, or sensor nodes, each of which is equipped with sensing, processing and wireless communication capabilities. Thanks to advances in microelectronics and wireless technologies, wireless sensors are small in size, and can be deployed at low cost over different kinds of environments in order to monitor both over space and time the variations of physical quantities such as temperature, humidity, light, or sound. In environmental monitoring studies, many applications are expected to run unattended for months or years. Sensor nodes are however constrained by limited resources, particularly in terms of energy. Since communication is one order of magnitude more energy-consuming than processing, the design of data collection schemes that limit the amount of transmitted data is therefore recognized as a central issue for wireless sensor networks.An efficient way to address this challenge is to approximate, by means of mathematical models, the evolution of the measurements taken by sensors over space and/or time. Indeed, whenever a mathematical model may be used in place of the true measurements, significant gains in communications may be obtained by only transmitting the parameters of the model instead of the set of real measurements. Since in most cases there is little or no a priori information about the variations taken by sensor measurements, the models must be identified in an automated manner. This calls for the use of machine learning techniques, which allow to model the variations of future measurements on the basis of past measurements.This thesis brings two main contributions to the use of learning techniques in a sensor network. First, we propose an approach which combines time series prediction and model selection for reducing the amount of communication. The rationale of this approach, called adaptive model selection, is to let the sensors determine in an automated manner a prediction model that does not only fits their measurements, but that also reduces the amount of transmitted data. The second main contribution is the design of a distributed approach for modeling sensed data, based on the principal component analysis (PCA). The proposed method allows to transform along a routing tree the measurements taken in such a way that (i) most of the variability in the measurements is retained, and (ii) the network load sustained by sensor nodes is reduced and more evenly distributed, which in turn extends the overall network lifetime. The framework can be seen as a truly distributed approach for the principal component analysis, and finds applications not only for approximated data collection tasks, but also for event detection or recognition tasks. /Les réseaux de capteurs sans fil forment une nouvelle famille de systèmes informatiques permettant d'observer le monde avec une résolution sans précédent. En particulier, ces systèmes promettent de révolutionner le domaine de l'étude environnementale. Un tel réseau est composé d'un ensemble de capteurs sans fil, ou unités sensorielles, capables de collecter, traiter, et transmettre de l'information. Grâce aux avancées dans les domaines de la microélectronique et des technologies sans fil, ces systèmes sont à la fois peu volumineux et peu coûteux. Ceci permet leurs deploiements dans différents types d'environnements, afin d'observer l'évolution dans le temps et l'espace de quantités physiques telles que la température, l'humidité, la lumière ou le son.Dans le domaine de l'étude environnementale, les systèmes de prise de mesures doivent souvent fonctionner de manière autonome pendant plusieurs mois ou plusieurs années. Les capteurs sans fil ont cependant des ressources limitées, particulièrement en terme d'énergie. Les communications radios étant d'un ordre de grandeur plus coûteuses en énergie que l'utilisation du processeur, la conception de méthodes de collecte de données limitant la transmission de données est devenue l'un des principaux défis soulevés par cette technologie. Ce défi peut être abordé de manière efficace par l'utilisation de modèles mathématiques modélisant l'évolution spatiotemporelle des mesures prises par les capteurs. En effet, si un tel modèle peut être utilisé à la place des mesures, d'importants gains en communications peuvent être obtenus en utilisant les paramètres du modèle comme substitut des mesures. Cependant, dans la majorité des cas, peu ou aucune information sur la nature des mesures prises par les capteurs ne sont disponibles, et donc aucun modèle ne peut être a priori défini. Dans ces cas, les techniques issues du domaine de l'apprentissage machine sont particulièrement appropriées. Ces techniques ont pour but de créer ces modèles de façon autonome, en anticipant les mesures à venir sur la base des mesures passées. Dans cette thèse, deux contributions sont principalement apportées permettant l'applica-tion de techniques d'apprentissage machine dans le domaine des réseaux de capteurs sans fil. Premièrement, nous proposons une approche qui combine la prédiction de série temporelle avec la sélection de modèles afin de réduire la communication. La logique de cette approche, appelée sélection de modèle adaptive, est de permettre aux unités sensorielles de determiner de manière autonome un modèle de prédiction qui anticipe correctement leurs mesures, tout en réduisant l'utilisation de leur radio.Deuxièmement, nous avons conçu une méthode permettant de modéliser de façon distribuée les mesures collectées, qui se base sur l'analyse en composantes principales (ACP). La méthode permet de transformer les mesures le long d'un arbre de routage, de façon à ce que (i) la majeure partie des variations dans les mesures des capteurs soient conservées, et (ii) la charge réseau soit réduite et mieux distribuée, ce qui permet d'augmenter également la durée de vie du réseau. L'approche proposée permet de véritablement distribuer l'ACP, et peut être utilisée pour des applications impliquant la collecte de données, mais également pour la détection ou la classification d'événements. Doctorat en Sciencesinfo:eu-repo/semantics/nonPublishe

    Apprentissage dans les réseaux de capteurs pour une surveillance environnementale moins coûteuse en énergie

    No full text
    Wireless sensor networks form an emerging class of computing devices capable of observing the world with an unprecedented resolution, and promise to provide a revolutionary instrument for environmental monitoring. Such a network is composed of a collection of battery-operated wireless sensors, or sensor nodes, each of which is equipped with sensing, processing and wireless communication capabilities. Thanks to advances in microelectronics and wireless technologies, wireless sensors are small in size, and can be deployed at low cost over different kinds of environments in order to monitor both over space and time the variations of physical quantities such as temperature, humidity, light, or sound. In environmental monitoring studies, many applications are expected to run unattended for months or years. Sensor nodes are however constrained by limited resources, particularly in terms of energy. Since communication is one order of magnitude more energy-consuming than processing, the design of data collection schemes that limit the amount of transmitted data is therefore recognized as a central issue for wireless sensor networks.An efficient way to address this challenge is to approximate, by means of mathematical models, the evolution of the measurements taken by sensors over space and/or time. Indeed, whenever a mathematical model may be used in place of the true measurements, significant gains in communications may be obtained by only transmitting the parameters of the model instead of the set of real measurements. Since in most cases there is little or no a priori information about the variations taken by sensor measurements, the models must be identified in an automated manner. This calls for the use of machine learning techniques, which allow to model the variations of future measurements on the basis of past measurements.This thesis brings two main contributions to the use of learning techniques in a sensor network. First, we propose an approach which combines time series prediction and model selection for reducing the amount of communication. The rationale of this approach, called adaptive model selection, is to let the sensors determine in an automated manner a prediction model that does not only fits their measurements, but that also reduces the amount of transmitted data. The second main contribution is the design of a distributed approach for modeling sensed data, based on the principal component analysis (PCA). The proposed method allows to transform along a routing tree the measurements taken in such a way that (i) most of the variability in the measurements is retained, and (ii) the network load sustained by sensor nodes is reduced and more evenly distributed, which in turn extends the overall network lifetime. The framework can be seen as a truly distributed approach for the principal component analysis, and finds applications not only for approximated data collection tasks, but also for event detection or recognition tasks. /Les réseaux de capteurs sans fil forment une nouvelle famille de systèmes informatiques permettant d'observer le monde avec une résolution sans précédent. En particulier, ces systèmes promettent de révolutionner le domaine de l'étude environnementale. Un tel réseau est composé d'un ensemble de capteurs sans fil, ou unités sensorielles, capables de collecter, traiter, et transmettre de l'information. Grâce aux avancées dans les domaines de la microélectronique et des technologies sans fil, ces systèmes sont à la fois peu volumineux et peu coûteux. Ceci permet leurs deploiements dans différents types d'environnements, afin d'observer l'évolution dans le temps et l'espace de quantités physiques telles que la température, l'humidité, la lumière ou le son.Dans le domaine de l'étude environnementale, les systèmes de prise de mesures doivent souvent fonctionner de manière autonome pendant plusieurs mois ou plusieurs années. Les capteurs sans fil ont cependant des ressources limitées, particulièrement en terme d'énergie. Les communications radios étant d'un ordre de grandeur plus coûteuses en énergie que l'utilisation du processeur, la conception de méthodes de collecte de données limitant la transmission de données est devenue l'un des principaux défis soulevés par cette technologie. Ce défi peut être abordé de manière efficace par l'utilisation de modèles mathématiques modélisant l'évolution spatiotemporelle des mesures prises par les capteurs. En effet, si un tel modèle peut être utilisé à la place des mesures, d'importants gains en communications peuvent être obtenus en utilisant les paramètres du modèle comme substitut des mesures. Cependant, dans la majorité des cas, peu ou aucune information sur la nature des mesures prises par les capteurs ne sont disponibles, et donc aucun modèle ne peut être a priori défini. Dans ces cas, les techniques issues du domaine de l'apprentissage machine sont particulièrement appropriées. Ces techniques ont pour but de créer ces modèles de façon autonome, en anticipant les mesures à venir sur la base des mesures passées. Dans cette thèse, deux contributions sont principalement apportées permettant l'applica-tion de techniques d'apprentissage machine dans le domaine des réseaux de capteurs sans fil. Premièrement, nous proposons une approche qui combine la prédiction de série temporelle avec la sélection de modèles afin de réduire la communication. La logique de cette approche, appelée sélection de modèle adaptive, est de permettre aux unités sensorielles de determiner de manière autonome un modèle de prédiction qui anticipe correctement leurs mesures, tout en réduisant l'utilisation de leur radio.Deuxièmement, nous avons conçu une méthode permettant de modéliser de façon distribuée les mesures collectées, qui se base sur l'analyse en composantes principales (ACP). La méthode permet de transformer les mesures le long d'un arbre de routage, de façon à ce que (i) la majeure partie des variations dans les mesures des capteurs soient conservées, et (ii) la charge réseau soit réduite et mieux distribuée, ce qui permet d'augmenter également la durée de vie du réseau. L'approche proposée permet de véritablement distribuer l'ACP, et peut être utilisée pour des applications impliquant la collecte de données, mais également pour la détection ou la classification d'événements. Doctorat en Sciencesinfo:eu-repo/semantics/nonPublishe

    Unsupervised and supervised compression with principal component analysis in wireless sensor networks

    No full text
    First International Workshop on Knowledge Discovery from Sensor Datainfo:eu-repo/semantics/nonPublishe
    corecore